ABSTRACT

Recent research hints at an underappreciated 
complexity in pre-miRNA processing and regulation. 
Global profiling of pre-miRNA and its potential to 
increase understanding of the pre-miRNA land- 
scape is impeded by overlap with highly expressed 
classes of other non-coding RNA (ncRNA). Here, we
present a data set excluding these RNA before 
sequencing through locked nucleic acids (LNA), 
greatly increasing pre-miRNA sequence counts 
with no discernable effect on pre-miRNA or mature 
miRNA sequencing. Analysis of profiles generated in 
total, nuclear and cytoplasmic cell fractions reveals 
that pre-miRNAs are subject to a wide range of 
regulatory processes involving loci-specific 3'- and 
5'-end variation entailing complex cleavage patterns
with co-occurring polyuridylation. Additionally, 
examination of nuclear-enriched flanking se- 
quences of pre-miRNA, particularly those derived 
from polycistronic miRNA transcripts, provides 
insight into miRNA and miRNA-offset (moRNA) pro- 
duction, specifically identifying novel classes of 
RNA potentially functioning as moRNA precursors. 
Our findings point to particularly intricate regulation 
of the let-7 family in many ways reminiscent of 
DICER1-independent, pre-mir-451-like processing,
introduce novel and unify known forms of pre- 
miRNA regulation and processing, and shed new
light on overlooked products of miRNA processing
pathways. 

INTRODUCTION 

MicroRNAs (miRNAs), 20-23 nt short RNAs regulat-
ing stability and translational efficiency of transcribed
mRNAs through complementary binding of target 
mRNA transcripts (1), are produced via transcription 
from the genome as primary (pri-miRNA) transcripts 
encoding either single or multiple (polycistronic) miRNA 
precursor hairpin-like regions, excision of hairpin regions, 
termed precursor miRNAs (pre-miRNAs), via the micro-
processor complex containing the RNase III enzyme
DROSHA (2,3), transport of pre-miRNAs into the cyto-
plasm via 3'-overhang recognition (4,5), duplex generation
via hairpin loop removal by the RNase III enzyme
DICER1 (6), and selection of a single strand of the duplex 
(the 'mature' strand) for association with a member of 
the AGO family (7,8). AGO-miRNA association forms 
functional RNA-induced silencing complexes (RISC) 
which bind to and regulate mRNA transcripts. 

New research is revealing diverse regulatory pathways 
influencing levels of mature miRNA (9-13), some of which 
act directly on pre-miRNA hairpins. LIN28A binds a 
conserved nucleotide sequence motif in the hairpin loop 
region of the pre-let-7 miRNA family (14-17) acting as 
a processivity factor (18) in the untemplated addition of 
polyuridine (poly(U)) tails to the 3'-ends of pre-miRNAs 
via the ZCCHC11 enzyme, a member of the TRF family 
in the DNA polymerase β-like superfamily of
ribonucleotidyltransferases (19,20); LIN28A binding-
induced structural changes and/or poly(U)-tailing block
DICER1 uptake (21-23). Similarly, MBNL1 binding to a
distinct motif in the hairpin region of pre-mir-1 regulates 
miR-1 expression (24). An additional form of regulation 
was observed in AGO2-mediated endonucleolytic cleavage
~9-11 nt from the 3' pre-miRNA end (25). This cleavage
is an essential step in the recently described DICER1-
independent pre-mir-451 processing pathway wherein
the 3'-cleaved pre-mir-451 hairpin is unwound and 
subject to polyuridylation. This poly(U) tail appears to 
act as a signal for exonucleolytic degradation which 
proceeds until reduction to ~23nt, the remaining 
length likely shielded from exonuclease activity by 
AGO2 (26-28).

With emerging research suggesting that pre-miRNAs, 
far from being static intermediates in the pathway to 
mature miRNA production, are subject to diverse forms 
of regulation, the need to better understand the global
landscape of pre-miRNA sequences has increased. 
However, deep profiling of pre-miRNA sequences faces 
a substantial obstacle: the length range overlap of 
pre-miRNAs with other, far more numerous classes of 
ncRNA, including tRNA and snoRNA. Our group has 
previously successfully reduced expression of deep 
sequencing artifacts in the form of adapter-dimers, 
increasing the yield of genuine RNA sequences in a 
given library (29). Additionally, next-generation transcrip-
tome sequencing data display a wide range of tag expres-
sion levels; some tags are expressed at much higher levels
than others (30,31). Synthesizing the former technique with
the latter observation, we have developed a novel 
approach to increase pre-miRNA yield during small 
RNA library construction using locked nucleic acid 
(LNA)-based antisense oligonucleotides which specifically 
hybridize to the most abundant endogenous sequences in 
a library. The resulting data set presented here represents 
the first in vivo, complete full-length profiles of nuclear, 
cytoplasmic, and total cellular populations of human 
pre-miRNA. Analysis of this data set reveals that 
pre-miRNAs are subject to far more complex regulatory 
processes than previously realized and potentially links 
previously described aspects of pre-miRNA processing 
and regulation. 

MATERIALS AND METHODS 

Cell culture and RNA extraction 

HeLa cells were purchased from RIKEN BioResource 
Center and cultured in DMEM (Invitrogen, Carlsbad, 
CA, USA) and 10% FBS in 5% CO2 at 37°C.
Cultured cells were collected, washed twice with cold 
PBS and incubated in Solution A (50 mM Tris-HCl pH 
7.5, 0.8 M sucrose, 150 mM potassium chloride, 5 mM
magnesium chloride, 6 mM β-mercaptoethanol and
0.5% NP-40) for 10 min on ice (32). Cytoplasmic
extracts were cleared by centrifugation at 16 000 g for
15 min at 4°C and cytoplasmic RNAs were extracted
with TRIzol LS (Invitrogen) and FastPure RNA kit 
(Takara Bio, Ohtsu, Shiga, Japan) from the extracts. 



Pellets were washed twice with Solution A, suspended 
with TRIzol (Invitrogen) followed by RNA extraction 
with FastPure RNA kit (yielding nuclear RNAs). Total 
RNAs were extracted with TRIzol and FastPure RNA 
kit as previously described (33). 

Small RNA library construction and deep sequencing 

Small RNA cDNA libraries were generated from 1.2 µg of
HeLa cell RNAs (total, cytoplasmic and nuclear fraction
RNAs) as previously described (29) with ~2 µM each
LNA/DNA oligonucleotide (GeneDesign, Ibaraki, 
Osaka, Japan) for the most highly expressed sequences 
in the reverse transcription reaction at 47°C. Nucleotide 
sequences are shown in Supplementary Table S1. Deep
sequencing was performed using an Illumina GAIIx
sequencer (Illumina, San Diego, CA, USA) with a
maximum read length of 115 nt. Sequencing data are
deposited in the DNA Data Bank of Japan (DDBJ) 
under accession number DRA000455. 

Selection of targets for LNA/DNA oligo treatment 

Targets were selected by extracting the fifty most abun- 
dantly sequenced 3'-ends in untreated libraries. The 27 
3'-ends showing the highest relative rankings across the 
three libraries were selected for targeting (Supplementary 
Data set S1). LNA/DNA oligos were designed as
described previously (29), using the 3'-ends of the target 
RNA species as the template for hybridizing the 3'-ends of 
the LNA/DNA oligos (Figure 1). 

Analysis of libraries 

Sequences from each library were processed and mapped 
to the human genome (hg18 assembly) using software
provided by Illumina with standard settings. Artifact 
sequences were further filtered using the TagDust 
program (34) and quality of the data independently moni- 
tored with the SAMStat program (35). pre-miRNA ex- 
pression counts were determined by identifying sequence 
genome overlap with known miRNA loci [ver.16, 
miRBase (36)]. miRBase definitions occasionally include 
degradation products of other ncRNA instead of bona fide 
miRNA sequences (37); we detected several such se-
quences which were likely snoRNA degradation
products based on distributions across compartments 
and size fractions (e.g. large numbers of long reads but 
few or no reads in short fractions). These sequences 
(pre-mir-3607, pre-mir-3651, pre-mir-3647 and pre-mir- 
3653) were manually culled from our list, ensuring the 
calculated numbers were as conservative as possible. 
Given the large number of sequences that appeared to 
be polyuridylated in the data and the potential for such 
extended tails to prevent proper mapping, we filtered the 
raw data, removing extended regions at the 3'-end which 
appeared to harbor poly(U) tails. Tags with removed 
poly(U) tails were re-mapped to the genome and then 
checked for concordance with known miRNA positions 
on the genome. While this procedure extracted quite 
a few tags which then mapped to the genome, a total of 
two tags mapped to miRNA loci, indicating that this 
filtering had little effect on pre-miRNA sequence counts. 

Figure 1. Overview of LNA/DNA targeting technique. LNA/DNA oligos are added to the RNA library preparation prior to the cDNA synthesis
step and are designed to interact directly with RNA targets, inhibiting the reverse transcription (RT) reaction as they cannot be used as RT primers.
Examples of some designed LNA/DNAs are provided at the bottom of the figure; the complete list is available in Supplementary Data set S1.



Poly(U) tails were identified through successive removal 
of 3' nt of all sequenced reads not exactly matching the 
genome. When such a truncated read exactly matched the 
genome, the removed nucleotides were tested under the 
following criteria to determine if the read contained a 3' 
poly(U) tail: (i) total nucleotide length of the poly(U) tail 
was >3 and (ii) at least 80% of the nucleotides within the
poly(U) tail-like regions were uridine. The second re- 
quirement is necessary as a hedge against the known 
propensity of the Illumina sequencers to introduce 
sequencing errors at the 3'-ends of long reads (38). The 
above process ensures that all identified poly(U) tails are 
(i) located at the 3'-end of the read and (ii) outside of the 
tail region, match exactly to the genome. These poly(U) 
tails are, therefore, most likely the result of 
post-transcriptional processing. In a similar manner, 
any read identified as having a 3' or 5' cleavage site
was required to exactly match the genome. Cleavage
sites were required to be positioned five or more nucleo- 
tides internal to 5' or 3'-ends to guard against inclusion 
of differential cutting mediated by DROSHA. 
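
The trimming procedure described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the data set: `matches_genome` is a hypothetical callable standing in for an exact-match lookup against the genome, `min_body_len` is an assumed safeguard against trivially short matches that the text does not specify, and the 80% uridine requirement is read here as a minimum.

```python
from typing import Callable, Optional, Tuple


def split_poly_u_tail(
    read: str,
    matches_genome: Callable[[str], bool],  # hypothetical exact-match lookup
    min_tail_len: int = 4,                  # text: tail length > 3 nt
    min_u_fraction: float = 0.8,            # text: 80% uridine, read as a minimum
    min_body_len: int = 15,                 # assumption, not stated in the text
) -> Optional[Tuple[str, str]]:
    """Return (genome-matching body, poly(U) tail), or None if no tail is called."""
    # Sequenced cDNA reports T where the RNA carries U.
    read = read.upper().replace("T", "U")
    if matches_genome(read):
        return None  # exactly matching reads carry no untemplated tail
    # Successively remove 3' nucleotides until the remainder matches exactly.
    for cut in range(len(read) - 1, min_body_len - 1, -1):
        body, tail = read[:cut], read[cut:]
        if matches_genome(body):
            u_fraction = tail.count("U") / len(tail)
            if len(tail) >= min_tail_len and u_fraction >= min_u_fraction:
                return body, tail
            return None  # removed nucleotides do not look like a poly(U) tail
    return None
```

Because the body must match the genome exactly and the tail is by construction the 3'-most segment, any tail returned satisfies both conditions listed above.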

Primary pre-isomirs, the isomirs with the largest 
number of sequence reads at individual miRNA loci, 
were identified as previously described for mature 
isomiRs (33). Pair probabilities were also
calculated as previously described for duplex structures
(33,39) substituting the RNAfold program for the 
RNAcofold program to reflect the differences in 
calculating pairing probability for a single hairpin strand 
versus two strands forming a duplex and with the average 
value taken across all pair probability values calculated 
for the first 5, 10, 15, 20 or all nucleotides in the 
hairpin. Comparisons were made across the set of all 
pre-miRNA loci with 3' (or 5') cleavage events and those 
lacking any evidence of cleavage. pre-miRNA lengths 
underlying comparisons across fractions and miRBase 
hairpin definitions were calculated by weighting all 
pre-isomir lengths observed at individual loci according
to their expression, yielding a single length for each locus.
Lengths were compared to the set of miRBase hairpin 
lengths for which at least one tag was observed in either 
the total, cytoplasmic or nuclear fraction libraries. A 
polycistronic miRNA locus was defined as a locus with 
<200 nt between itself and a neighboring miRNA locus. 
To identify ppiRNA and fpRNA, we constructed sets of 
genome coordinates bridging the distance between the 
polycistronic miRNA loci identified above and extending 
200 nt upstream and downstream of all non-polycistronic 
miRNA loci, respectively. Large-scale genome analyses 
were carried out using bedtools and samtools software 
packages (40,41). Statistical analyses were performed 
using the R language and environment for statistical 
computing. 
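
Two of the definitions above lend themselves to a short sketch: the expression-weighted pre-miRNA length per locus and the <200 nt rule for polycistronic loci. The function names and the tuple layout are illustrative assumptions, and strand is ignored for simplicity; the actual analysis was carried out with bedtools, samtools and R.

```python
from typing import Dict, List, Set, Tuple

# (chromosome, start, end, name); this layout is an assumption for illustration
Locus = Tuple[str, int, int, str]


def expression_weighted_length(isomir_counts: Dict[int, int]) -> float:
    """Collapse pre-isomir lengths (nt -> read count) at one locus into one length."""
    total = sum(isomir_counts.values())
    if total == 0:
        raise ValueError("locus has no reads")
    return sum(length * n for length, n in isomir_counts.items()) / total


def polycistronic_loci(loci: List[Locus], max_gap: int = 200) -> Set[str]:
    """Names of loci lying < max_gap nt from a neighbouring locus on the same chromosome."""
    names: Set[str] = set()
    ordered = sorted(loci)  # sorts by chromosome, then start coordinate
    for left, right in zip(ordered, ordered[1:]):
        same_chrom = left[0] == right[0]
        if same_chrom and right[1] - left[2] < max_gap:
            names.update((left[3], right[3]))
    return names
```

For example, a locus with three reads of 60 nt and one read of 64 nt is assigned a length of 61 nt, and two hairpins separated by 120 nt on the same chromosome are both flagged as polycistronic.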

RESULTS 

Enrichment in pre-miRNA sequences following 
LNA/DNA treatment 

Deep sequencing of HeLa cells in the control condition 
provided targets for LNA/DNA treatment (see 'Materials 
and Methods', Figure 1 and Supplementary Data set S1).
ncRNA classes are visibly affected following LNA/DNA
treatment (Supplementary Table S1 and Supplementary
Figure S1). Moreover, individual species of RNA
targeted by LNA/DNA treatment are efficiently reduced
(Supplementary Figure S2). Comparison of pre-miRNA
sequences in the LNA(-) versus LNA(+) libraries
revealed a marked increase from a few hundred tags in
each library to as many as 20 000 tags in the cytoplasmic
LNA(+) library (Figure 2A and B). In addition
to increasing total sequence counts, the number 
of pre-miRNA loci covered also roughly doubled 
(Figure 2B). 

Figure 2. General features of LNA(+) libraries. (A) Relative enrichment of pre-miRNA sequences in LNA(+) versus corresponding LNA(-) library
as a percentage of the total number of sequences within a library. (B) Table showing raw number of pre-miRNA sequences and the number of
pre-miRNA loci with at least one sequence identified in each library. (C) Comparison of pre-miRNA sequence expression normalized to tags per ten
million (tptm) across LNA(-) and LNA(+) conditions in the total cell fraction; see Supplementary Figure S3 for comparisons across cytoplasmic,
nuclear and mature fractions. (D) Comparison of pre-miRNA sequence expression (tptm) across nuclear and cytoplasmic fractions. (E) Comparison 
of LNA(+) pre-miRNA expression in the total fraction against publicly available total short-read miRNA expression in HeLa cells (42) (summing 
mature miRNA and miRNA* sequences within individual loci) normalized to the tags per million miRNA within a library (tpmm). (F) Comparison 
of length distributions across different libraries alongside miRBase reference lengths (36) ('Materials and Methods' section). 



LNA/DNA treatment does not affect miRNA expression 

Possible effects of LNA/DNA treatment on pre-miRNA 
and mature miRNA were examined by comparing 
LNA(+) and LNA(-) sequence counts. We observed
high correlations across each fraction (rho = 0.64-0.71) 
with increases in the LNA(+) condition (Figure 2C 
and Supplementary Figure S3A). We also observed high 
correlation across mature libraries (rho = 0.90), suggest- 
ing that LNA/DNA treatment does not affect mature 
miRNA sequencing (Supplementary Figure S3B). 
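
The comparison above rests on two simple operations: scaling raw locus counts to tags per ten million (tptm) and computing a rank correlation between conditions. The sketch below assumes that the reported rho is Spearman's coefficient, which the text does not state explicitly; the function names are illustrative only.

```python
from math import sqrt
from typing import Dict, List


def to_tptm(counts: Dict[str, int], library_size: int) -> Dict[str, float]:
    """Scale raw counts per locus to tags per ten million (tptm)."""
    return {locus: n * 1e7 / library_size for locus, n in counts.items()}


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: List[float], y: List[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

A rank-based measure is a natural choice here because LNA/DNA treatment raises pre-miRNA counts overall; what matters is whether loci keep their relative ordering between LNA(-) and LNA(+) libraries.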

Little correlation was observed across cytoplasmic 
and nuclear compartments. Notably, LNA/DNA treat- 
ment increases the dynamic range across the two compart- 
ments, clarifying relationships between locus counts in 
nuclear and cytoplasmic compartments (Figure 2D, 
Supplementary Figure S4). Little correlation was also 
observed when comparing total cellular pre-miRNA loci 
counts to total mature miRNA counts (42) (Figure 2E), 
likely related to some combination of the following three 
factors: (i) the influence of different regulatory pathways 
on pre-miRNA processing (see below), (ii) misannotation 
of sequences as pre-miRNA hairpins in miRBase (37) and 
(iii) differences in the relative sequencing depth between
the two populations. Sequence counts for pre-miRNA
and mature miRNA loci are provided in Supplementary
Tables S2-S4, and alignments of all sequences to pre-miRNA
loci in Supplementary Data set S2.

General features of sequenced pre-miRNA 

While the composition and lengths of mature miRNA are 
well-characterized through deep sequencing and more 
targeted approaches, the precise genome boundaries of 
pre-miRNA precursors are difficult to unequivocally 
establish, particularly in cases where mature expression 
from one arm of the pre-miRNA is low (43). Our 
approach enables precise definitions for such transcripts 
(Supplementary Data set S2). On a global scale, lengths of 
sequenced pre-miRNAs were collected and compared 
across compartments and with corresponding miRBase 
hairpin definitions (Figure 2F; 'Materials and Methods' 
section). Our data indicate that pre-miRNA lengths
are tightly clustered around 60 nt, with no
significant length differences observed across compart- 
ments. miRBase definitions were longer in median length 
(86 nt) and considerably more widely dispersed (Figure 
2F). To elucidate discrepancies between miRBase and 
sequenced pre-miRNA lengths, we mapped positions
of sequenced pre-miRNAs relative to miRBase start and 
end positions. The majority of sequence 5' start sites in
our data were within 5 nt of the miRBase-defined start site
while 3'-ends display an unusual gradual increase with the 
majority of reads located within 10-15 nt of the miRBase 
end site (Supplementary Figure S5). 

Mature animal miRNAs display substantial 5'/3'-end 
heterogeneity as revealed by deep sequencing (31,43,44) 
and detailed biochemical probing of pre-miRNA struc- 
tures (45,46), yielding multiple distinct sequence 
'isomiRs' from a single locus. While the DICER1
enzyme contributes substantially to this heterogeneity 
through differential cleavage, DROSHA also plays a 
role in enhancing heterogeneity at the 3' terminus of 
mature miRNAs derived from the 3' arm of the hairpin 
structure (31,43-45,47). We calculated positional variation
at the 5' and 3'-ends of all unique sequenced pre-miRNAs 
from the total cell fraction using the most frequently 
sequenced pre-isomir (hereafter the 'primary pre-isomir') 
as a reference and compared this with mature miRNA
variation (42) (Figure 3A). Similar heterogeneity is
observed in pre-miRNA and mature miRNAs when con- 
sidering all unique isomiRs; however, when comparing 
only isomiRs mapping exactly to the genome (thereby 
removing effects of nucleotidyltransferase-mediated 3' 
addition events (33,48,49)), heterogeneity in pre-miRNA
sequences decreases (Figure 3A). The same trend is 
evident in the set of all sequenced tags (Supplementary 
Figure S6), suggesting modifications following 
DROSHA-mediated cleavage contribute to pre-miRNA 
end heterogeneity. Consistent with this, a distinctive 'tail' 
is observed in the region downstream of the 3'-end of 
pre-isomirs (Figure 3A). Comparison of end heterogeneity 
across nuclear and cytoplasmic compartments revealed 
slightly greater heterogeneity at 3'-ends stemming from 
post-cleavage modifications likely occurring in the cyto- 
plasmic fraction (Supplementary Figure S7). 
Heterogeneity at the 3'-ends of unique mature miRNA
sequences derived from only the 3'-pre-miRNA arm
cannot be entirely explained by DROSHA cutting,
suggesting contribution from unidentified nucleases
(Supplementary Figure S8).

Figure 3. Analysis of pre-miRNA sequence features. (A) Analysis of heterogeneity at the 5' (left side) and 3' (right side) ends of pre-miRNA relative
to mature miRNA considering unique sequences in total cellular fractions. Proportions of sequences in a given library are plotted against the location
of their 5' and 3' ends relative to the primary pre-isomir (pre-miRNA) or primary isomiR (mature miRNA) normalized to the zero point in all line
charts. Negative numbers refer to positions internal to the pre-miRNA hairpin. The top plot shows proportions for all unique sequences in the
libraries, the bottom charts show proportions for all exactly mapping unique sequences in the libraries. A black box highlights the extended region of
3' end variation resulting from poly(U)-tailing when examining all sequences (top right) and the lack of this feature when examining only exactly
matching sequences (bottom right). (B and C) Plotting the proportion of nucleotide mismatches in pre-miRNA sequences from the total cellular
fraction at labeled positions around a zero point normalized to (B) the 3' end of the primary pre-isomir and (C) the miRBase-defined 3' end point of
the mature or miRNA* sequence derived from the 3' arm of the pre-miRNA hairpin. (D and E) Proportion of sequences with poly(U) tails (D) and
poly(U) tail length distributions (E) in each cellular fraction. (F) List of loci with identified poly(U) tails, divided into loci with the LIN28A
recognition motif in the experimentally determined relevant location of the pre-miRNA hairpin and those lacking such a motif. ' A ' denotes loci
containing poly(U) tails at the miRBase-defined 3' end. (G) Proportion of poly(U) tails occurring across the two sets of loci defined in (F) in each
cellular fraction.

Over-representation of uridine mismatches 

To investigate the source of heterogeneity in Figure 3A, 
we systematically tallied nucleotide mismatches to the 
genome for pre-miRNA sequences at positions surround- 
ing the 3' terminus of the primary pre-isomir. A clear over- 
representation of uridine mismatches extends roughly 
eight basepairs downstream of the 3' terminus of the 
primary pre-isomir (Figure 3B), suggesting the presence 
of 3' terminal poly(U)-tailing. However, when setting the 
3' terminus for each pre-miRNA locus as the last base in 
miRBase-defined 3' hairpin-derived miRNA (mature or
miRNA*) sequence, we surprisingly observed a dramatic 
shift in uridine mismatches to positions upstream of the 
3'-end (Figure 3C), suggesting that poly(U)-tailing events 
detected in the libraries are primarily internal relative to 
canonical, miRBase-defined pre-miRNA hairpin struc-
tures and the primary pre-isomir of at least some loci
have truncated 3' arms. 

Characterization of widespread poly(U)-tailing 

Further analysis of poly(U)-tailing determined 18-20% of 
all tags in total and cytoplasmic fractions and 10% in the 
nuclear fraction harbored poly(U) tails (Figure 3D; 
'Materials and Methods' section). Length distributions 
of poly(U) tails were similar across all three fractions, 
centered on 5-7 nt (Figure 3E); these lengths are more 
consistent with poly(U)-tailing observed following 
AGO2 cleavage in pre-mir-451 (26-28) than with the
~14nt poly(U) tails found at 3' termini of let-7 family 
pre-miRNAs (14). Polyuridylation affects two groups of 
loci: (i) LIN28A-binding motif-containing pre-miRNAs 
(see above) and (ii) pre-miRNAs lacking the LIN28A-binding
motif (Figure 3F, Supplementary Table S5). The
relative concentration of poly(U) tails in loci belonging to 
the former group is significantly higher than loci belonging 
to the latter group (Figure 3G) indicating poly(U)-tailing 
is concentrated in LIN28A-binding motif-containing 
pre-miRNAs. 

Poly(U)-tailing and 3' cleavage

With the bulk of poly(U) tails originating from points 
internal to the 3' hairpin terminus, we postulated 
poly(U) tails may be related to 3'-arm nuclease activity. 
An analysis of all pre-miRNA tags revealed exceptionally 
high rates of probable 3' nuclease-mediated cleavage: 44% 
in the total fraction, 46% in the cytoplasmic fraction and 
19% in the nuclear fraction (Figure 4A). We examined 
the relationship between 3' cleavage events and poly(U)- 
tailing in three sets of pre-miRNAs: LIN28A-binding 
motif-containing, pre-miRNAs with poly(U)-tailing but 
lacking canonical LIN28A recognition motifs (see 
below), and pre-miRNA with 3' cleavage events lacking 
poly(U)-tailing (see Supplementary Discussion). 

LIN28A binding motif-containing pre-miRNA. LIN28A
associates with a 'GGAG' sequence motif positionally
restricted to the 3'-end of the hairpin loop structure of
pre-miRNAs (18,21). This motif is conserved across the 
let-7 family and is found in several other pre-miRNAs 
(18). Of the known pre-miRNAs harboring this motif, 
only let-7 family members were expressed in our libraries; 
let-7 family 3'-end positions were plotted relative to unique 
origin sites of poly(U) tails in all three cellular fractions 
(total cell in Figure 4B, nuclear, cytoplasmic fractions in 
Supplementary Figure S9). In addition to the expected 
peak at the 3' terminus, a striking periodicity is observed 
in 3'-end positions with peaks centered at -10, -20 and
-30 nt positions internal to the pre-miRNA structure with
evidence of 'tiling' between the peaks (Figure 4B,
Supplementary Figure S9 and S10). Remarkably, the dis-
tribution of poly(U) tail origin sites mirrors this pattern of
periodicity and tiling with one exception: poly(U) tail for-
mation is rarely observed prior to the -10 nt position
(Figure 4B, Supplementary Figure S9 and S10). The peak
centered at -10 nt is consistent with in vivo cleavage
mediated by AGO2 slicer activity [indeed, pre-let-7a is spe-
cifically targeted by in vivo AGO2-mediated 3' cleavage
(25)], suggesting that AGO2-like cleavage events precede
internal poly(U) tail formation. 
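
The positional profiles discussed above amount to tabulating, for each locus, the offset of every read 3'-end from the miRBase-defined 3'-end, with negative offsets lying inside the hairpin. A minimal sketch is given below; the function name, the genomic-coordinate inputs and the strand handling are assumptions made for illustration rather than a description of the scripts used here.

```python
from collections import Counter
from typing import Dict, Iterable


def end_offset_proportions(
    read_ends: Iterable[int],  # genomic coordinate of each read's 3'-end
    reference_end: int,        # miRBase-defined 3'-end of the same locus
    strand: str = "+",
) -> Dict[int, float]:
    """Proportion of reads at each 3'-end offset; negative offsets are internal."""
    offsets: Counter = Counter()
    for end in read_ends:
        # On the minus strand the 3' direction runs toward lower coordinates.
        offset = end - reference_end if strand == "+" else reference_end - end
        offsets[offset] += 1
    total = sum(offsets.values())
    return {offset: n / total for offset, n in sorted(offsets.items())}
```

Passing one coordinate per unique sequence reproduces the 'unique sequences' view, whereas passing one coordinate per tag gives the view over the complete set of library tags; peaks near -10, -20 and -30 would then appear as elevated proportions at those keys.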

Potential sources of the second and third cleavage peaks 
are less clear. Intriguingly, mapping end positions directly 
onto let-7 family structures reveals the second peak is 
positioned just downstream of the LIN28A binding site 
(Figure 4C, Supplementary Figure S11). While LIN28A
may not be highly expressed in HeLa cells, a similar RNA-
binding factor could shield pre-miRNA hairpins from exo-
nuclease activity beyond the -20 nt position; dissoci-
ation of such a factor could contribute to the tiling and
peaks observed around the -30 nt position. Importantly,
while the -30 nt peak is clearly identified when examining
unique sequences in the library, it is infrequent in the
context of the complete set of library tags,
indicating that these cleavage and poly(U)-tailing events
are rare relative to AGO2-mediated cleavage events
(Supplementary Figure S9-S10). However, comparison
across nuclear and cytoplasmic compartments reveals
that a striking number of poly(U) tails originate at the
-20 nt position in the nuclear fraction (Supplementary
Figure S10, Discussion). 

pre-miRNA with poly(U)-tailing and no LIN28A-like 
recognition motif 

Several loci lacking LIN28A recognition motifs in the 
hairpin loop contain 3' cleavage and poly(U)-tailing. The 
frequency of these events is lower relative to LIN28A- 
binding motif-containing pre-miRNAs (Figure 3F and 
G, Supplementary Table S5) but the events display similar period-
icity (Supplementary Figure S10 and S12), suggesting that
the cleavage/polyuridylation patterns are controlled by a 
less-efficient version of the same regulatory process. 
Mutations to the LIN28A recognition site weaken but 
do not ablate LIN28A association with let-7 family 
members (18) and LIN28A binds to and regulates expres- 
sion of pre-mir-1, which lacks the canonical binding site 
(24), suggesting that LIN28A or a LIN28A-like factor 
could contribute to the observed events. 

Figure 4. Concomitant pre-miRNA cleavage and polyuridylation. (A) Frequency and enrichment of sequences with 3' nuclease activity, compared
with polyuridylation events. (B) Histogram plotting the proportions of 3' ends of unique sequences and unique sites of polyuridylation initiation in
LIN28A binding motif-containing pre-miRNA sequences, revealing concomitant periodicity at -10, -20 and -30 nt peaks. Zero point in the
histogram refers to 3' end of the mature miRNA/miRNA* sequence derived from the 3' pre-miRNA arm defined by miRBase. Negative values
refer to points internal to the pre-miRNA hairpin. See Supplementary Figures S9-10, S12 for comparisons involving set of all sequences and across
nuclear and cytoplasmic fractions. (C) Predicted hairpin structure of pre-let-7b with barplot representing the total number of raw counts in the total
cellular fraction with 3' cleavage events (green) and polyuridylation initiation sites (purple). LIN28A recognition site is colored in red (see also
Supplementary Figure S11). (D) Boxplot comparing average nucleotide pairing probability for set of pre-miRNAs with 3' cleavage events ('3'C')
against those lacking cleavage events ('NC'). While little difference is observed in the average pairing probabilities for the first five nucleotides when
counting from the 5' base of the stem, as more nucleotides are included in the calculations the differences are significant (at 15 nt, Wilcoxon rank sum
test, P = 0.0048; at 20 nt, P = 0.0024). (E) Histogram depicting proportion of 5' cleavage events at given locations across all pooled libraries, 
revealing clear peak in the 20-23 nt range (see also Supplementary Figure S14 and Table S6). 



AGO2-mediated 3' cleavage events have been linked to
base-pairing in the initial nucleotides of the pre-miRNA 
stem (25) and highly complementary base pairing along 
the stem of the hairpin miRNA structure (26-28). We 
failed to uncover evidence of bias in initial base-pairing 
but comparison to the set of loci lacking 3' cleavage events 
supports a role for general complementary base-pairing 
along the hairpin stem (Figure 4D). 

A handful of pre-miRNA sequences in this group show 
genuine 3' poly(U) tails including pre-mir-21, 106b, 15a, 
1307 and 1226 (Supplementary Table S5). It is possible 
that these tails are involved in blocking DICER1
uptake; however, given none of these hairpins contain 
properly positioned LIN28A or other conserved recogni- 
tion motifs (21-23), poly(U) tails could instead be 
involved in distinct regulatory processes. For example, 
pre-mir-1226 is a mirtron; mirtrons are excised as 
introns from precursor mRNA transcripts and fold into 
hairpin structures independent of DROSHA processing
(50-52). pre-mir-1226 is predicted to fold into a hairpin
structure with a rare 5' overhang; in this case 3' poly(U)-
tailing could provide the 3' overhang necessary for cyto-
plasmic export (Supplementary Figure S13) (4,5). An add-
itional 3' tail observed in pre-mir-1307 is the only
identified instance of a poly(A) tail (Supplementary
Figure S13).

5' pre-miRNA cleavage 

We also observe 5' cleavage events scattered across 
a diverse set of pre-miRNA loci. 3' cleavage events 
associated with internal polyuridylation almost exclusively 
occur in pre-miRNA loci giving rise to mature miRNA 
from either the 5' arm or both arms; 5' cleavage events 
occur in pre-miRNA giving rise to mature miRNA from 
both 5' and 3' hairpin arms (Supplementary Tables S5 and 
S6). Mapping the distribution of 5'-ends of all unique tags 
pooled from all three fractions reveals a distinctive peak in 
the range of -20 to -23 nt internal to the pre-miRNA
hairpin, but no peak in the -10 nt region (Figure 4E,
Supplementary Figure S14). Of the 93 (73%) 5' cleavage
events in the -20 to -23 nt range, 68 are derived from a single
pre-miRNA locus, pre-let-7i; 23% of all tags at the 
pre-let-7i locus undergo 5' cleavage (Supplementary 
Data set S2) matching the 5' lengths of mature let-7i 
miRNA. Generation of single-nick pre-miRNA hairpin 
products via recombinant DICER1 has been independent-
ly observed in vitro by two groups (53,54); the above
observations suggest in vivo processing at a limited 
number of loci. The unusual hairpin loop structure 
harbored by let-7i may contribute to high levels of poten-
tial single-nick DICER1 processing (Supplementary
Figure S15) which in turn explains distinctive pre-let-7i
expression patterns relative to other let-7 family 
members (Figure 2E). 

pre-mir-21 also undergoes considerable 5' cleavage in 
the -5 to -10 nt range (Supplementary Table S6,
Supplementary Data set S2). miR-21 is derived from the 
5' arm, indicating that such cleavage could affect mature 
miRNA production. Inspection of deep-sequenced mature 
miRNA data (42) suggests that this cleavage persists in 
processed products, indicating that these 5'-shortened 
hairpins are substrates for DICER 1 processing. 
Investigation of the functional effects and molecular 
basis of this 5' hairpin cleavage, which appears specific 
to pre-mir-21, may increase understanding of the onco- 
genic role of miR-21 in various cancer types (55). 

Sequences flanking pre-miRNA loci 

As pre-miRNAs are typically processed from pri-miRNA 
precursors, investigation into regions surrounding 
pre-miRNA sequences in LNA(+) libraries can provide 
insight into miRNA biogenesis. Related to this, recent 
research has identified a novel class of small RNAs 
(18-23 nt in length) located immediately downstream 
and upstream of pre-miRNA hairpins, termed miRNA- 
offset RNA (moRNA). Accumulating evidence suggests 
that moRNAs are produced through directed production 
pathways (56,57) and are not merely byproducts of 
miRNA biogenesis. Two proposals have been suggested 
for moRNA production: (i) double-stranded cleavage 
of extended hairpin regions on pri-miRNA transcripts 
via secondary DROSHA processing (56) and (ii)
exonucleolytic activity on precursor transcripts (58,59). 

Flanking pre-miRNA (fpRNA) sequences (~60-115 nt
in length) are less abundant than total pre-miRNA 
sequences and are enriched in the nucleus, consistent 
with ratios of moRNAs relative to mature miRNAs and 
compartmental moRNA enrichment (57), suggesting a 
possible link between fpRNA and moRNA processing 
(Figure 5A-C, Supplementary Table S7, Figure S16A). 
Interestingly, while moRNAs are strongly biased for
derivation from the 5' region of pre-miRNAs (57)
(Supplementary Figure S17), fpRNA display a converse
bias for derivation from the 3' region (Figure 5C), possibly
reflecting increased moRNA processing efficiency in
5'-derived fpRNA. Similar to previous research, we
observe no correlation between moRNA and associated
mature miRNA transcripts (60); this extends to fpRNA
and associated pre-miRNA sequences (Supplementary
Table S7).

moRNA sequence abundance was significantly higher 
in pre-miRNAs derived from polycistronic miRNA 
sequences (Figure 5D, 'Materials and Methods' section 
and Supplementary Figure S16B). Reads flanking 
polycistronic sequences bridge pre-miRNA loci and there- 
fore cannot be assigned to 5'- or 3'-ends of any single 
pre-miRNA locus. Considered independently of fpRNA
sequences, polycistronic pre-miRNA intervening RNA
(ppiRNA) (~50-115 nt in length) are also enriched in
the nuclear fraction; however, unlike fpRNA, ppiRNA
are substantially more abundant than associated
pre-miRNA sequences (Figure 5E and 5A, Supplementary
Figure S16B, Table S8). The high observed ppiRNA
sequence counts possibly result from the unique
structure/sequence features of the intermediates furnished
by processing of polycistronic pri-miRNA transcripts,
including the lack of 5' cap structures or 3' poly(A)
tails, which would tend to be present in the fpRNA.
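
Abundances in Figure 5 are tpm normalized, and the polycistronic enrichment in Figure 5D is assessed with a Wilcoxon rank sum test. The plain-Python sketch below illustrates both steps under stated assumptions: tpm is taken to mean tags per million within a library, and the test uses the normal approximation with a tie correction and no continuity correction. The exact test settings used for Figure 5D are not stated here, and the function names and toy values are hypothetical.

```python
import math


def tpm(counts):
    """Scale raw tag counts to tags per million within one library."""
    total = sum(counts.values())
    return {name: 1e6 * n / total for name, n in counts.items()}


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney) test, normal approximation.

    Tied values receive average ranks and the variance is tie-corrected.
    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Values at sorted positions i..j-1 share ranks i+1..j; use their mean.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Toy values, invented for illustration only.
toy_tpm = tpm({"locus_a": 1, "locus_b": 3})
toy_u, toy_p = rank_sum_test([1, 2, 3], [4, 5, 6])
```

The normal approximation is only rough for very small samples such as the toy input; an exact test would be preferable there.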

To gain further insight into moRNA processing, hetero- 
geneity within individual moRNA loci was examined. 
moRNAs derived from 5' and 3' regions flanking 
pre-miRNAs display broad length distributions incon- 
sistent with a DROSHA-based cleavage mechanism 
(Figure 5F and G). We analyzed the ends of fpRNAs 
corresponding to DROSHA cleavage sites, reasoning 
that targeted moRNA production would result in 
moRNA-removed fpRNA intermediates beginning 
18-23 nt downstream of fpRNA start sites (Figure 5H 
and I). DROSHA cleavage sites display variation consistent
with 3'-5' exonuclease activity in fpRNA derived from
the 5' region of the pre-miRNA transcript (Figure 5H).
Similarly, fpRNA derived from the 3'-end of pre-miRNA
transcripts in the nuclear fraction appear subject to
low-level 5'-3' exonucleolytic activity (Figure 5I).
fpRNA ends not affected by DROSHA were also 
examined; 5'-ends (presumed sites of transcription initi- 
ation) show little variation while 3'-ends appear subject 
to exonucleolytic activity (Supplementary Figure S18A 
and B). Similar analyses of ppiRNA revealed no clear 
signal indicative of moRNA processing (Supplementary 
Figure S18C and D). 
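
The end-variation analysis above can be sketched in a few lines of Python. Following the Figure 5H and I description, the most frequently occurring end position is taken as the inferred cleavage point, and the proportion of reads ending at each distance from it is reported. The function name, the tie-breaking rule and the toy coordinates are assumptions made for this illustration, not the procedure actually used for the figures.

```python
from collections import Counter


def end_offset_proportions(end_positions):
    """Proportion of reads ending at each distance from the modal end position.

    end_positions: coordinates of one end (5' or 3') of every read at a locus.
    The modal end is treated as the inferred cleavage point (offset 0).
    Ties for the mode are broken by the smaller coordinate (arbitrary choice).
    """
    if not end_positions:
        return {}
    counts = Counter(end_positions)
    modal = min(counts, key=lambda pos: (-counts[pos], pos))
    total = len(end_positions)
    return {pos - modal: n / total for pos, n in sorted(counts.items())}


# Toy coordinates, invented for illustration only.
toy_ends = [50, 50, 50, 51, 52, 52]
toy_props = end_offset_proportions(toy_ends)
```

A distribution skewed to one side of offset 0 would be compatible with progressive trimming from that end, whereas a second sharp peak 18-23 nt away would be the signature expected from targeted moRNA removal. How offsets translate into 5' or 3' direction depends on the strand and on which read end is supplied.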

Shared nuclear compartment enrichment and comparable
moRNA/miRNA and fpRNA/pre-miRNA
enrichment levels suggest that fpRNA sequences may act
as precursors for moRNA production (Figure 5A and B).
However, several observations argue that moRNA production
may be more complex than previously thought:
(i) the persistence of fpRNA derived from the 3'-end of
the pre-miRNA (Figure 5C) suggests that the 5' arms
(from which the bulk of moRNAs are produced) could
be processed independent of 3' arms and thus independent
of double-stranded cleavage mechanisms; (ii) the
observed enrichment of moRNAs flanking pre-miRNA
loci belonging to polycistronic transcripts (Figure 5D,
Supplementary Figure 14) and the enrichment of
ppiRNAs relative to their flanking pre-miRNA sequences
(Figure 5E) could suggest that moRNAs from polycistronic
and non-polycistronic transcripts are produced
through distinct pathways; (iii) the wide range of lengths
observed in 'mature' moRNA sequences in this (Figure 5F
and G) and other research (57) argues against a strictly
measured cleavage mechanism; and (iv) the puzzling lack of
observed intermediates in fpRNA and particularly
ppiRNA corresponding to the expected lengths of
moRNA sequences, together with the possibility of exonuclease
activity in some locations and cellular fractions (Figure
5H and I, Supplementary Figure S15). This final point,
however, is hampered by lack of sequencing depth: it is
possible that intermediates derived from targeted endonuclease
activity are escaping detection, particularly given the low
abundance of moRNA transcripts relative to mature
miRNA counterparts (Figure 5B). The DROSHA-mediated
cleavage argument for moRNA production has
been bolstered by a computational meta-survey of pooled
small RNA libraries in Drosophila (61) and more
targeted studies in the basal chordate Ciona (56).
Further characterization, both computational and experimental,
will be required to sort out the roles of different
nucleases and possible differences between the mammalian
and other animal moRNA biogenesis pathways.
The striking stability of ppiRNA could indicate additional
functions beyond an moRNA precursor role.

Figure 5. fpRNA and ppiRNA in relation to moRNA. (A and B) Relative abundance of fpRNA and moRNA to their cognate pre-miRNA and mature miRNA sequences. (C) Slight enrichment is observed in fpRNA located adjacent to the 3' arm of the pre-miRNA in both nuclear and cytoplasmic fractions; tpm normalized to facilitate comparison across the fractions. (D) Boxplot depicting the significant enrichment (Wilcoxon rank sum test, P = 9.2e-5) of moRNA sequences derived from polycistronic pri-miRNA transcripts. (E) Enrichment of ppiRNA sequences in both the nuclear fraction and relative to pre-miRNA sequences derived from polycistronic pri-miRNA transcripts; tpm normalized. (F and G) Line plots depicting proportion of moRNA sequences relative to distances from the inferred DROSHA cleavage point for moRNA derived from polycistronic (brown) and non-polycistronic (green) pri-miRNA transcripts from regions adjacent to the 3' (F) and 5' (G) ends of pre-miRNA hairpins. Little difference is observed, suggesting similar modes of processing, while the range of lengths in moRNA sequences argues against consistent cleavage points. (H and I) Line plots depicting proportions of fpRNA sequences with 3' (orange) or 5' (black) edges remaining at the indicated distances from the most-frequently occurring, inferred DROSHA-mediated cleavage point separating pre-miRNA from fpRNA sequences in the cytoplasmic (H) and nuclear (I) fractions. Both (H) and (I) are indicative of possible 3'-5' exonucleolytic processing likely unrelated to moRNA production.

DISCUSSION 

Here, we present the first complete pre-miRNA deep
sequencing profiles in total, nuclear and cytoplasmic
HeLa cellular fractions. These data sets form an invaluable
resource for understanding global trends in
pre-miRNA regulation. The analysis presented here and
summarized in Figure 6 reinforces the coalescing notion
that pre-miRNA hairpins are subject to a wide range of
diverse and targeted regulatory processes (14-17,24,25).
Notably, the data (i) connect previously described
AGO2-catalyzed pre-miRNA 3' hairpin cleavage (25) to
polyuridylation and concomitant exonuclease activity,
(ii) suggest that the hairpins targeted by AGO2
cleavage are intricately tied to interaction with potential
RNA-binding factors like LIN28A, and (iii) provide
evidence linking a broad set of miRNA loci to the
pre-mir-451-like, DICER1-independent processing
pathway, raising the possibility that in several loci an
additional pathway exists for production of the RNA
component associating with AGO2 during RISC formation.
Given the conservation of LIN28 in the crown group
eukaryotes (62), it is possible that this pathway is active in
both animal and plant miRNA processing. The fundamental
evolutionary implication, however, is that RNAi
could be widely active in the presence of only an AGO
family member (19) and an RNA hairpin, consistent with
the observation that AGO-centric RNAi is remarkable in
its ability to incorporate various RNA and DNA sources
derived from distinct processing pathways as guide strands
which target mRNA across all three superkingdoms of
life (63-71). However, as the specific AGO clade of the
AGO superfamily is a later innovation in eukaryotes (as
opposed to the more ancestral PIWI clade) (62,72), the
complete phyletic distribution of such an AGO-RNA
hairpin processing pathway remains unclear. In addition
to providing insight into processing pathways
incorporating AGO-based cleavage, the data presented
in this manuscript provide insight into moRNA
production and suggest the presence of novel hairpin
regulatory pathways including (i) mirtron hairpin processing
through poly(U)-tailing, (ii) 5', single-strand
endonucleolytic cleavage in generation of mature
miRNAs and (iii) exonucleolytic activity on the 5'-end of
hairpins (Figure 6).

... via the splicing machinery may result in hairpins with 5' overhangs (Supplementary Figure S13). 3' polyuridylation could provide 3' overhangs for cytoplasmic export. (vii) Some pre-miRNA loci are subject to 5' cleavage, possibly processed by single-nick DICER1 activity (Supplementary Figure S15).

Reduction of LIN28A has previously been shown to 
increase mature let-7 family miRNA expression (21,22) 
and decrease substrate miRNA target expression (22) 
through a proposed pathway entailing let-7 family 
hairpin-binding, polyuridylation at the 3' terminus, and 
rapid degradation (14,21). Our data suggest the presence
of an additional contributing pathway wherein a factor
like LIN28A, instead of binding directly to the
pre-miRNA hairpin, binds to the hairpin after or in
conjunction with AGO2-mediated 3' pre-miRNA cleavage
(Figure 6, Supplementary Figure S19). Reduction of the
factor would facilitate exonucleolytic degradation of the
hairpin, similar to miR-451, down to the length range
(20-23 nt) protected by AGO2 association, increasing mature
let-7 expression and decreasing target expression.
Intriguingly, comparison of length distributions of
mature miR-451, let-7 family, and miRNA not affected
by 3' cleavage identifies a significant difference between
the distributions of mature miR-451 and miRNA not affected
by 3' cleavage, but not between miR-451 and the let-7 family,
suggesting that small amounts of let-7 could be processed
in a manner similar to mir-451 (Supplementary Figure S20;
'Materials and Methods' section). These distributions
appear consistent with lengths of miRNA observed in
DICER1-knockout mouse models (73). Alternatively, 3'
cleavage/poly(U)-tailing may divert pre-miRNA from
functional RNAi activity as a kind of hairpin
'sequestering', a function with the same end result as a signal for
degradation (Figure 6). Analyzing the effects of LIN28A
and DICER1 knockdown on mature and hairpin
deep-sequence profiles could discriminate between these
possibilities.

Extreme 3' terminus poly(U)-tailing was not observed in 
large quantities. Two possible reasons for this absence are 
as follows: (i) polyuridylation renders the pre-miRNA 
transcripts extremely unstable and/or (ii) the sequencing 
depth of our libraries remains too shallow to detect 3' 
pre-miRNA poly(U) tailing. This second reason is particu- 
larly attractive given similar difficulties in observing 
single, 3' U addition events and given that these two 
processes could be interrelated (33,74) (Supplementary 
Discussion). Additionally, it is important to note that the
sequences with poly(U) tails that are detected in large
quantities in these libraries could be stabilized through
AGO2 interaction.

In summary, these data and analyses open new avenues of
research into understanding pre-miRNA regulatory
processes. The lengths and sequence composition of
many of the processed pre-miRNAs outlined in Figure 6
would preclude detection via traditional deep sequencing
and miRNA amplification methods, which could have
important ramifications for common laboratory techniques
probing siRNA pathways through DICER knockdown.
Of additional specific interest are potential relationships
of the findings to let-7 family-mediated tumor suppression
and miR-21-mediated tumorigenesis: disruption of an
alternative hairpin processing pathway (Figure 6) or an
introduced imbalance between this pathway and the
standard processing pathway could dramatically alter
let-7 and miR-21 roles in maintenance of cellular stability.
Future improvements to the pre-miRNA yield of this
experimental technique will assist in investigation of
other areas of pre-miRNA regulation requiring deeper
coverage including editing, single nucleotide addition
(Supplementary Discussion), and extreme 3' poly(U)-tailing.

NOTE ADDED IN PROOF

Since submitting this manuscript, Newman and colleagues 
(75) published a paper reporting a 3' primer-based 
amplification method for pre-miRNA sequencing. While 
this method precludes detection of 5' pre-miRNA 
variation and the surrounding sequences presented here, 
the reported 3' variation largely agrees with results 
presented here. Two exceptions are as follows: (1) their
sequencing is capable of detecting greater 3'-end poly(U)
tailing, likely related to differences in sequencing depth
discussed above; (2) they detect high levels of single/double
U addition at several loci, which we do not observe. This
may point to cell-specific variation, possibly influenced by
relative expression of LIN28A or functionally related factors. It
also supports a mechanistic demarcation between initial
uridylation and processive formation of 3' poly(U) tails,
again likely dependent on expression of processivity
factors (see Supplementary Discussion).